2,079 research outputs found

    Recurring Query Processing on Big Data

    Advances in hardware, software, and networks have enabled applications from business enterprises, scientific and engineering disciplines, and social networks to generate data at unprecedented volume, variety, velocity, and veracity. Innovation in these domains is thus now limited by the ability to analyze and discover knowledge from the collected data in a timely and scalable fashion. To facilitate such large-scale big data analytics, the MapReduce computing paradigm and its open-source implementation Hadoop are among the most popular and widely used technologies. Hadoop's success as a competitor to traditional parallel database systems lies in its simplicity, ease of use, flexibility, automatic fault tolerance, superior scalability, and cost effectiveness due to its use of inexpensive commodity hardware that can scale to petabytes of data over thousands of machines. Recurring queries, repeatedly executed for long periods of time on rapidly evolving high-volume data, have become a bedrock component in most of these analytic applications. Efficient execution and optimization techniques must be designed to assure the responsiveness and scalability of these recurring queries. In this dissertation, we thoroughly investigate topics in the area of recurring query processing on big data. First, we propose a novel scalable infrastructure called Redoop that treats recurring queries over big evolving data as first-class citizens during query processing. This is in contrast to state-of-the-art MapReduce/Hadoop systems, which experience significant challenges when faced with recurring queries, including redundant computations, significant latencies, and huge application development efforts. Redoop offers innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. 
Redoop also retains the fault tolerance of MapReduce via automatic cache recovery and task re-execution support. Second, we address the crucial need to accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated data sets, e.g., latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. On top of Redoop, we built a scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called Helix. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Furthermore, Helix introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize SLA satisfaction. Third, recurring analytics queries tend to be expensive, especially when query processing consumes data sets in the hundreds of terabytes or more. Time-sensitive recurring queries, such as fraud detection, often come with tight response-time constraints in the form of query deadlines. Data sampling is a popular technique for computing approximate results with an acceptable error bound while reducing high-demand resource consumption and thus improving query turnaround times. In this dissertation, we propose the first fast approximate query engine for recurring workloads in the MapReduce infrastructure, called Faro. 
Faro introduces two key innovations: (1) a deadline-aware sampling strategy that builds samples from the original data with reduced sample sizes compared to uniform sampling, and (2) adaptive resource allocation strategies that maximally improve the approximate results while still meeting the response time requirements specified in recurring queries. In our comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in scalability, effectiveness, and robustness.
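The inter-window caching idea mentioned in this abstract can be illustrated with a small sketch. This is not Redoop's implementation: the class name `PaneCache`, the choice of a count aggregate, and the fixed number of panes per window are all assumptions made for illustration. The sketch splits the input into panes (batches that arrive between recurrences), aggregates each pane once, and answers every recurrence by merging the cached partial results for the panes still inside the window.

```python
from collections import Counter, deque


class PaneCache:
    """Hypothetical cache of per-pane partial counts for a sliding window."""

    def __init__(self, panes_per_window):
        self.panes_per_window = panes_per_window
        self.partials = deque()   # one Counter per pane currently in the window
        self.panes_aggregated = 0  # how many panes were ever aggregated

    def add_pane(self, records):
        # Aggregate the newly arrived pane exactly once and cache the result.
        self.partials.append(Counter(records))
        self.panes_aggregated += 1
        # Evict the partial result of the pane that slid out of the window.
        if len(self.partials) > self.panes_per_window:
            self.partials.popleft()

    def window_result(self):
        # Merge cached partials instead of rescanning the raw records.
        total = Counter()
        for partial in self.partials:
            total.update(partial)
        return total


def recompute_from_scratch(panes):
    """Baseline: rescan every record of every pane in the window."""
    return Counter(record for pane in panes for record in pane)
```

With a window of three panes, each recurrence aggregates only the newest pane, while the baseline rescans all three. The trade-off is the memory needed to keep one partial result per pane.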

    Generalized Area Spectral Efficiency: An Effective Performance Metric for Green Wireless Communications

    Area spectral efficiency (ASE) was introduced as a metric to quantify the spectral utilization efficiency of cellular systems. Unlike other performance metrics, ASE takes into account the spatial property of cellular systems. In this paper, we generalize the concept of ASE to study arbitrary wireless transmissions. Specifically, we introduce the notion of affected area to characterize the spatial property of arbitrary wireless transmissions. Based on the definition of affected area, we define the performance metric, generalized area spectral efficiency (GASE), to quantify the spatial spectral utilization efficiency as well as the greenness of wireless transmissions. After illustrating its evaluation for point-to-point transmission, we analyze the GASE performance of several different transmission scenarios, including dual-hop relay transmission, three-node cooperative relay transmission, and underlay cognitive radio transmission. We derive closed-form expressions for the GASE metric of each transmission scenario under a Rayleigh fading environment whenever possible. Through mathematical analysis and numerical examples, we show that the GASE metric provides a new perspective on the design and optimization of wireless transmissions, especially on the selection of transmit power. We also show that introducing relay nodes can greatly improve the spatial utilization efficiency of wireless systems. We illustrate that the GASE metric can help optimize the deployment of underlay cognitive radio systems. Comment: 11 pages, 8 figures, accepted by TCo

    Inexact Bregman Proximal Gradient Method and its Inertial Variant with Absolute and Relative Stopping Criteria

    The Bregman proximal gradient method (BPGM), which uses the Bregman distance as a proximity measure in the iterative scheme, has recently been re-developed for minimizing convex composite problems without the global Lipschitz gradient continuity assumption. This makes the BPGM appealing for a wide range of applications, and hence it has received growing attention in recent years. However, most existing convergence results are only obtained under the assumption that the involved subproblems are solved exactly, which is not realistic in many applications. For the BPGM to be implementable and practical, in this paper, we develop inexact versions of the BPGM (denoted by iBPGM) by employing either an absolute-type stopping criterion or a relative-type stopping criterion for solving the subproblems. The iteration complexity of $\mathcal{O}(1/k)$ and the convergence of the sequence are also established for our iBPGM under some conditions. Moreover, we develop an inertial variant of our iBPGM (denoted by v-iBPGM) and establish the iteration complexity of $\mathcal{O}(1/k^{\gamma})$, where $\gamma\geq1$ is a restricted relative smoothness exponent. When the smooth part in the objective has a Lipschitz continuous gradient and the kernel function is strongly convex, we have $\gamma=2$ and thus the v-iBPGM improves the iteration complexity of the iBPGM from $\mathcal{O}(1/k)$ to $\mathcal{O}(1/k^2)$, in accordance with the existing results on the exact accelerated BPGM. Finally, some preliminary numerical experiments for solving the discrete quadratic regularized optimal transport problem are conducted to illustrate the convergence behaviors of our iBPGM and v-iBPGM under different inexactness settings.
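To make the Bregman step concrete, here is a minimal sketch of the exact (not inexact) Bregman proximal gradient iteration on a toy problem. Everything specific is an assumption for illustration and is not taken from the paper: the kernel is the negative entropy, the constraint set is the probability simplex, the smooth part is a simple quadratic, and the step size is 0.5. With this kernel the subproblem has a closed-form multiplicative solution, so no inner solver or stopping criterion is needed. The inexact variants in the abstract address the harder case where no such closed form exists.

```python
import math


def objective(x, c):
    """Toy smooth part: f(x) = 0.5 * ||x - c||^2 (assumed for illustration)."""
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def gradient(x, c):
    return [xi - ci for xi, ci in zip(x, c)]


def bregman_step(x, grad, step):
    """Exact Bregman proximal step with the negative-entropy kernel.

    Solves argmin_y <grad, y> + (1/step) * KL(y, x) over the simplex,
    whose solution is y_i proportional to x_i * exp(-step * grad_i).
    """
    weights = [xi * math.exp(-step * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def run_bpgm(c, step=0.5, iterations=300):
    """Run the exact iteration from the uniform point; return iterate and history."""
    n = len(c)
    x = [1.0 / n] * n
    history = [objective(x, c)]
    for _ in range(iterations):
        x = bregman_step(x, gradient(x, c), step)
        history.append(objective(x, c))
    return x, history
```

Calling `run_bpgm([0.2, 0.3, 0.5])` keeps every iterate strictly inside the simplex without an explicit projection, which is the practical appeal of measuring proximity with a Bregman distance rather than the Euclidean one.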

    A Complete Reference of the Analytical Synchrotron External Shock Models of Gamma-Ray Bursts

    Gamma-ray bursts (GRBs) are the most luminous explosions in the universe. Their ejecta are believed to move towards Earth with a relativistic speed. The interaction between this "relativistic jet" and a circumburst medium drives a pair of (forward and reverse) shocks. The electrons accelerated in these shocks radiate synchrotron emission to power the broad-band afterglow of GRBs. The external shock theory is an elegant theory, since it invokes a limited number of model parameters and has well-predicted spectral and temporal properties. On the other hand, depending on many factors (e.g. the energy content, ambient density profile, collimation of the ejecta, forward vs. reverse shock dynamics, and synchrotron spectral regimes), there is a wide variety of models. These models make distinct predictions for the afterglow decay indices, the spectral indices, and the relations between them (the so-called "closure relations"), which have been widely used to interpret the rich multi-wavelength afterglow observations. This review article provides a complete reference of all the analytical synchrotron external shock afterglow models by deriving the temporal and spectral indices of all the models in all spectral regimes, including some regimes that have not been published before. The review article is designed to serve as a useful tool for afterglow observers to quickly identify relevant models to interpret their data. The limitations of the analytical models are reviewed, with a summary of situations in which numerical treatments are needed. Comment: 119 pages, 45 figures, invited review accepted for publication in New Astronomy Review

    The extension of variability properties in gamma-ray bursts to blazars

    Both gamma-ray bursts (GRBs) and blazars have relativistic jets pointing at a small angle from our line of sight. Several recent studies suggested that these two kinds of sources may share similar jet physics. In this work, we explore the variability properties for GRBs and blazars as a whole. We find that the correlation between minimum variability timescale (MTS) and Lorentz factor, $\Gamma$, as found only in GRBs by Sonbas et al. can be extended to blazars with a joint correlation of $\mathrm{MTS}\propto\Gamma^{-4.7\pm0.3}$. The same applies to the $\mathrm{MTS}\propto L_{\gamma}^{-1.0\pm0.1}$ correlation as found in GRBs, which can be well extended into blazars as well. These results provide further evidence that the jets in these two kinds of sources are similar despite the very different mass scales of their central engines. Further investigations of the physical origin of these correlations are needed, which can shed light on the nature of the jet physics. Comment: 6 pages, 2 figures, accepted for publication in MNRA
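As a quick reading aid for the quoted power law, the sketch below computes how the minimum variability timescale would change between two sources whose Lorentz factors differ by a given ratio. The function name `mts_ratio` is hypothetical. The calculation uses only the index from the abstract and ignores the normalization and intrinsic scatter of the correlation, neither of which is given here.

```python
def mts_ratio(gamma_ratio, index=-4.7):
    """Ratio MTS_2 / MTS_1 implied by MTS ∝ Γ^index when Γ_2 / Γ_1 = gamma_ratio."""
    return gamma_ratio ** index


# Effect of doubling the Lorentz factor, for the central index and its quoted range.
central = mts_ratio(2.0)         # index -4.7
shallow = mts_ratio(2.0, -4.4)   # index -4.7 + 0.3
steep = mts_ratio(2.0, -5.0)     # index -4.7 - 0.3
```

Under these assumptions, doubling the Lorentz factor shortens the timescale by a factor of about 26 for the central index, and by between roughly 21 and 32 across the quoted uncertainty on the index.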